Language Models appear to perform poorly on quantification. We ask how badly. 'Few'-type quantifiers, as in 'few children like vegetables', might pose a particular challenge for Language Models, since the sentence components without the quantifier are likely to co-occur, and because 'few'-type quantifiers are rare. We present 960 sentence stimuli from two human neurolinguistic experiments to 22 autoregressive transformer models of differing sizes. Not only do the models perform poorly on 'few'-type quantifiers, but overall the larger the model, the worse its performance. We interpret this inverse scaling as suggesting that larger models increasingly reflect online rather than offline human processing, and argue that the decreasing performance of larger models may challenge the use of Language Models as the basis for Natural Language Systems.
Are the predictions of humans and language models affected by similar things? Research suggests that while comprehending language, humans make predictions about upcoming words, with more predictable words being processed more easily. However, evidence also shows that humans display a similar processing advantage for highly anomalous words when these words are semantically related to the preceding context or to the most probable continuation. Using stimuli from 3 psycholinguistic experiments, we find that this is almost always also the case for 8 contemporary transformer language models (BERT, ALBERT, RoBERTa, XLM-R, GPT-2, GPT-Neo, GPT-J, and XGLM). We then discuss the implications of this phenomenon for our understanding of both human language comprehension and the predictions made by language models.
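As a concrete illustration of the kind of measurement such studies rely on, the sketch below quantifies how predictable a target word is in context as its surprisal under GPT-2, one of the causal models listed; it is a minimal sketch of the general technique, not the exact stimuli or analysis pipeline of the experiments above, and the example sentences are invented for illustration.

```python
# A minimal sketch (assumptions: GPT-2 via Hugging Face transformers; not the
# exact stimuli or analysis pipeline of the experiments above): quantify how
# predictable a target word is in context as its surprisal under a causal LM.
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def word_surprisal(context: str, target: str) -> float:
    """Surprisal (bits) of `target` given `context` under the model."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    tgt_ids = tokenizer(" " + target, return_tensors="pt").input_ids  # leading space for GPT-2's BPE
    ids = torch.cat([ctx_ids, tgt_ids], dim=1)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(ids).logits, dim=-1)
    # Sum log probabilities over the target's subword tokens; logits at
    # position t predict the token at position t + 1.
    total = 0.0
    for i in range(tgt_ids.shape[1]):
        pos = ctx_ids.shape[1] + i
        total += log_probs[0, pos - 1, ids[0, pos]].item()
    return -total / math.log(2)

# A predictable continuation vs. a related but less expected one (illustrative only).
print(word_surprisal("The children were told to eat their", "vegetables"))
print(word_surprisal("The children were told to eat their", "broccoli"))
```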
Some languages allow arguments to be omitted in certain contexts. Yet human language comprehenders reliably infer the intended referents of these zero pronouns, in part because they construct expectations about which referents are more likely. We ask whether neural language models also extract the same expectations. We test whether 12 contemporary language models display expectations that reflect human behavior when exposed to sentences with zero pronouns from five behavioral experiments on Italian reported in Carminati (2005). We find that three models - XGLM 2.9B, 4.5B, and 7.5B - capture the human behavior from all the experiments, with the others successfully modeling some of the results. This result suggests that human expectations about coreference can be derived from exposure to language, and also points to the properties of language models that allow them to better reflect human behavior.
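For this kind of zero-pronoun study, one way a model's "expectations" can be probed is by comparing the probabilities it assigns to alternative continuations. The sketch below is a hedged illustration only: it uses a small public XGLM checkpoint, and the Italian sentence pair is invented for illustration, not taken from Carminati (2005) or the paper's stimuli.

```python
# A hedged sketch (not the paper's protocol; the Italian sentences are invented
# for illustration): compare a model's relative preference for a null-subject
# vs. an overt-pronoun continuation, using a small public XGLM checkpoint.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

name = "facebook/xglm-564M"  # smaller sibling of the XGLM 2.9B/4.5B/7.5B models discussed above
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
model.eval()

def continuation_logprob(prefix: str, continuation: str) -> float:
    """Log probability of `continuation` given `prefix` under the model."""
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    cont_ids = tokenizer(continuation, add_special_tokens=False, return_tensors="pt").input_ids
    ids = torch.cat([prefix_ids, cont_ids], dim=1)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(ids).logits, dim=-1)
    total = 0.0
    for pos in range(prefix_ids.shape[1], ids.shape[1]):
        total += log_probs[0, pos - 1, ids[0, pos]].item()  # logits at t predict token t + 1
    return total

prefix = "Marta ha chiamato Lucia quando"
null_subject = continuation_logprob(prefix, " era in ufficio.")       # zero pronoun
overt_pronoun = continuation_logprob(prefix, " lei era in ufficio.")  # overt pronoun "lei"
print("preference for the null form (log-prob difference):", null_subject - overt_pronoun)
```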
Designing experiments often requires balancing between learning about the true treatment effects and earning from allocating more samples to the superior treatment. While optimal algorithms for the Multi-Armed Bandit Problem (MABP) provide allocation policies that optimally balance learning and earning, they tend to be computationally expensive. The Gittins Index (GI) is a solution to the MABP that can attain both optimality and computational efficiency, and it has recently been used in experiments with Bernoulli and Gaussian rewards. For the first time, we present a modification of the GI rule that can be used in experiments with exponentially-distributed rewards. We report its performance in simulated 2-armed and 3-armed experiments. Compared to traditional non-adaptive designs, our novel GI-modified design shows operating characteristics comparable in learning (e.g. statistical power) but substantially better in earning (e.g. direct benefits). This illustrates the potential of designs that use a GI approach to allocate participants to improve participant benefits, increase efficiency, and reduce experimental costs in adaptive multi-armed experiments with exponential rewards.
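The paper's GI modification is not reproduced here. As a hedged stand-in, the sketch below uses Thompson sampling with conjugate Gamma posteriors to illustrate adaptive allocation for exponentially distributed rewards and the learning-versus-earning trade-off; it is explicitly not the Gittins-index rule, and the rates and horizon are arbitrary.

```python
# A hedged sketch, NOT the paper's Gittins-index rule: Thompson sampling with
# conjugate Gamma posteriors as a simple adaptive allocation baseline for a
# 2-armed experiment with exponentially distributed rewards (larger reward = better).
import numpy as np

rng = np.random.default_rng(0)
true_rates = np.array([1.0, 0.5])  # exponential rates; arm 1 has the larger mean reward (1/rate)
alpha = np.ones(2)                 # Gamma(alpha, beta) prior on each arm's rate
beta = np.ones(2)

n_pulls = np.zeros(2, dtype=int)
total_reward = 0.0

for t in range(1000):
    sampled_rates = rng.gamma(shape=alpha, scale=1.0 / beta)  # one posterior draw per arm
    arm = int(np.argmin(sampled_rates))                       # smallest rate = largest expected reward
    reward = rng.exponential(scale=1.0 / true_rates[arm])
    # Gamma is conjugate to the exponential likelihood:
    # posterior = Gamma(alpha + n, beta + sum of observed rewards).
    alpha[arm] += 1
    beta[arm] += reward
    n_pulls[arm] += 1
    total_reward += reward

print("allocations per arm:", n_pulls)
print("posterior mean rates:", alpha / beta)
print("total reward earned:", round(total_reward, 1))
```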
Grasping is a remarkable ability that animals exercise with their arms and limbs in daily life. The human hand is an especially astonishing multi-fingered tool for precise grasping, which has helped humans build the modern world. Bringing the human grasp to virtual reality and telerobotics is as interesting as it is challenging. In this work, the authors survey, study, and analyze human hand-grasping behavior to explore the possibilities of haptic grasping in virtual and remote environments. The work focuses on the motion and force analysis of the fingers in human hand-grasping scenarios, and the paper describes the transition from human hand grasping to a tripod haptic grasp model for effective interaction in virtual reality.
Persuasion modeling is a key building block for conversational agents. Existing works in this direction are limited to analyzing textual dialogue corpora. We argue that visual signals also play an important role in understanding human persuasive behaviors. In this paper, we introduce the first multimodal dataset for modeling persuasion behaviors. Our dataset includes 199 dialogue transcriptions and videos captured in a multi-player social deduction game setting, 26,647 utterance-level annotations of persuasion strategy, and game-level annotations of deduction game outcomes. We provide extensive experiments to show how dialogue context and visual signals benefit persuasion strategy prediction. We also explore the generalization ability of language models for persuasion modeling and the role of persuasion strategies in predicting social deduction game outcomes. Our dataset, code, and models can be found at https://persuasion-deductiongame.socialai-data.org.
The promise of Mobile Health (mHealth) is the ability to use wearable sensors to monitor participant physiology at high frequencies during daily life to enable temporally-precise health interventions. However, a major challenge is frequent missing data. Despite a rich imputation literature, existing techniques are ineffective for the pulsative signals that are central to many mHealth applications, and a lack of available datasets has stymied progress. We address this gap with PulseImpute, the first large-scale pulsative signal imputation challenge, which includes realistic mHealth missingness models, an extensive set of baselines, and clinically relevant downstream tasks. Our baseline models include a novel transformer-based architecture designed to exploit the structure of pulsative signals. We hope that PulseImpute will enable the ML community to tackle this significant and challenging task.
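The benchmark's data and models are not reproduced here. As a minimal illustration of the problem setting only, the sketch below masks contiguous chunks of a synthetic pulse-like signal (a stand-in for real pulsative data) and compares two naive imputation baselines by RMSE on the missing samples.

```python
# Minimal illustration (not the PulseImpute benchmark itself): mask contiguous
# chunks of a synthetic pulse-like signal and compare two naive imputation
# baselines by RMSE on the masked samples.
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(3000) / 100.0  # 30 s at 100 Hz
signal = np.sin(2 * np.pi * 1.2 * t) ** 3 + 0.05 * rng.standard_normal(t.size)  # crude pulse shape

# Simulate sensor dropout as contiguous missing chunks (2 s each).
mask = np.ones(signal.size, dtype=bool)  # True = observed
for start in rng.choice(signal.size - 200, size=10, replace=False):
    mask[start:start + 200] = False

observed = np.where(mask, signal, np.nan)

# Baseline 1: fill gaps with the mean of the observed samples.
mean_fill = np.where(mask, signal, np.nanmean(observed))

# Baseline 2: linear interpolation across gaps.
idx = np.arange(signal.size)
linear_fill = observed.copy()
linear_fill[~mask] = np.interp(idx[~mask], idx[mask], signal[mask])

def rmse(est):
    return float(np.sqrt(np.mean((est[~mask] - signal[~mask]) ** 2)))

print("mean-fill RMSE:     ", round(rmse(mean_fill), 3))
print("linear-interp RMSE: ", round(rmse(linear_fill), 3))
```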
Fruit harvesting has recently experienced a shift towards soft grippers that possess compliance, adaptability, and delicacy. In this context, pneumatic grippers are popular because they provide high deformability and compliance; however, they typically possess limited grip strength. Jamming offers strong grip capability but limited deformability, and it often requires the object to be pushed onto a surface to attain a grip. This paper describes a hybrid gripper combining pneumatics (for deformation) and jamming (for grip strength). Our gripper utilises a torus (donut) structure with two chambers, controlled by pneumatic and vacuum pressure respectively, to conform around a target object. The gripper displays good adaptability, exploiting pneumatics to mould to the shape of the target object so that jamming can be successfully harnessed to grip. The main contribution of the paper is the design, fabrication, and characterisation of the first hybrid gripper that can use granular jamming in free space, achieving significantly larger retention forces than pure pneumatics. We test our gripper on objects of a range of sizes and shapes, as well as on picking a broad range of real fruit.
The introductory programming sequence has been the focus of much research in computing education. The recent advent of several viable and freely available AI-driven code generation tools presents immediate opportunities and challenges in this domain. In this position paper we argue that the community needs to act quickly in deciding which opportunities can and should be leveraged and how, while also working out how to overcome or otherwise mitigate the likely challenges. Assuming that the effectiveness and proliferation of these tools will continue to progress rapidly, without quick, deliberate, and concerted efforts, educators will lose the ability to help shape which opportunities come to be and which challenges will endure. With this paper we aim to seed this discussion within the computing education community.
Question Answering (QA) is a growing area of research, often used to facilitate the extraction of information from within documents. State-of-the-art QA models are usually pre-trained on domain-general corpora like Wikipedia and thus tend to struggle on out-of-domain documents without fine-tuning. We demonstrate that synthetic domain-specific datasets can be generated easily using domain-general models, while still providing significant improvements to QA performance. We present two new tools for this task: a flexible pipeline for validating the synthetic QA data and training downstream models on it, and an online interface to facilitate human annotation of this generated data. Using this interface, crowdworkers labelled 1117 synthetic QA pairs, which we then used to fine-tune downstream models and improve domain-specific QA performance by 8.75 F1 points.
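The paper's pipeline and annotation interface are not reproduced here. The sketch below shows one plausible automatic validation step for synthetic QA pairs, keeping a pair only if an off-the-shelf extractive QA model recovers the intended answer from the context; the model name and the example pair are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch of one plausible validation step (assumptions: the model name
# and example pair are illustrative, not from the paper): keep a synthetic QA
# pair only if an off-the-shelf extractive QA model recovers the intended answer.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

synthetic_pairs = [
    {"context": "The turbine is serviced every six months by the site engineer.",
     "question": "How often is the turbine serviced?",
     "answer": "every six months"},
]

validated = []
for pair in synthetic_pairs:
    pred = qa(question=pair["question"], context=pair["context"])
    answers_overlap = (pair["answer"].lower() in pred["answer"].lower()
                       or pred["answer"].lower() in pair["answer"].lower())
    if pred["score"] > 0.5 and answers_overlap:
        validated.append(pair)

print(f"kept {len(validated)} of {len(synthetic_pairs)} synthetic QA pairs")
```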